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THE BIAS-VARIANCE TRADE-OFF IN THOMSON’S MULTITAPER 

ESTIMATOR 


LUIS DANIEL ABREU AND JOSE LUIS ROMERO 

Abstract. At the heart of non-parametric spectral estimation, lies the dilemma known as 
the bias-variance trade-off: low-biased estimators tend to have high variance and low vari¬ 
ance estimators tend to have high bias. In 1982, Thomson introduced a multitaper method 
where this trade-off is made explicit by choosing a target bias resolution and obtaining a 
corresponding variance reduction. The method became the standard in many applications. 
Its favorable bias-variance trade-off is due to an empirical fact, conjectured by Thomson 
based on numerical evidence: assuming bandwidth W and N time domain observations, 
the average of the square of the first K = [2NW\ Slepian functions approaches, as K 
grows, an ideal band-pass kernel for the interval [-W, W]. We provide an analytic proof of 
this fact and quantify the approximation error in the L 1 norm; the approximation error is 
then used to control the bias of the multitaper estimator resulting from spectral leakage. 
This leads to new performance bounds for the method, explicit in terms of the bandwidth 
W and the number N of time domain observations. Our method is flexible and can be 
extended to higher dimensions and different geometries. 


1. Introduction 

Let / = [—1/2,1/2]. Any stationary, real, ergodic, zero-mean, Gaussian stochastic process 
has a Cramer spectral representation 

x(t) = Je^dZit), 

and the spectrum S(£), defined as 


S(0dt = E{\dZ(0\ 2 }, 

and often called the power spectral density of the process, yields the periodic components 
of x(t). The goal of spectral estimation is to solve the highly underdetermined problem of 
estimating S(£) from a sample of N contiguous observations x(0),..., x(N — 1). Embryonic 
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approaches to the problem (Stokes 1879, Shuster 1898) used the so called periodogram: 

„ N -1 


( 1 . 1 ) 


^ - N 




— 2iri^t 


t =0 


whose analysis has influenced harmonic analysts since Norbert Wiener (see 0 )- The peri¬ 
odogram can also be weighted with a data window {Dt}^ 1 , usually called a taper, giving 
the estimator: 
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( 1 . 2 ) 


Sd(Z) = 


N-l 


Y, x(t)D t e 


—2ni£t 


t =0 


The choice of the taper {.Dj }^ 1 can have a significant effect on the resulting spectrum 

estimate Sd■ This is apparent by observing that its expectation is the convolution of the 

2 

true (nonobservable) spectrum S(£) with the spectral window |J-\D(£)| = D t e~ 2m & , 


i.e., 


(1.3) 


E \S D (Z) \ = S{Z)*\FD{Z)f 


Thus, the bias of the tapered estimator, which is the difference S(£) — E{jSd(£)}> is deter¬ 


mined by the smoothing effect of {D t }^ = g 1 over the true spectrum. Ideally, the function 
should be concentrated on the interval [—^ 7 ,^ 7 ], but the uncertainty principle of 
Fourier analysis precludes such perfect concentration. Inevitably, some portion of the filter 
FD(f) will lie outside the target region and spectral leakage occurs. 

In [23], Thomson used the sequences which minimize spectral leakage to construct an 
algorithm using several tapered estimates, whence the name multitaper. In doing so, he 
was able to reduce variance by averaging, while introducing a tolerable amount of spectral 
leakage. Thomson’s multitaper method has been used in a variety of scientific applications 
including climate analysis (see, for instance [5], or [9] for a local spherical approach), and 
it was used to better understand the relation between atmospheric C0 2 and climate change 
(see [2K Section 1 ]). The method became also paramount in statistical signal analysis [XT] . 

Today, Thomson’s multitaper method remains an effective spectral estimation method. 
It has recently found remarkable applications in electroencephalography [7] and it is the 
preferred spectral sensing procedure [ 8 ] for the rapidly emerging field of cognitive radio [TO] . 
In the next paragraph we provide an outline of the essence of the method. 

Thomson’s method starts by selecting a target frequency smoothing band [-W, W\ with 
l/2iV < W < 1/2, thus accepting a reduction in spectral resolution by a factor of about 
2NW. The first step consists of obtaining a number K = |_2A^XWJ (the smallest integer not 


greater than 2NW) of estimates of the form (1.2) by setting, for every k € {0,...,/l — 1}, 
D t = v[ k \N,W), where the discrete prolate spheroidal sequences v^\n,W) are defined as 
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the solutions of the Toeplitz matrix eigenvalue equation 


N—l 


E sm ^ (t h n) »m = MN, W)v«\N, W). 


72—0 


7 t (t — n) 


The resulting tapered periodogram is then denoted by S k (C)- The second step consists of 
averaging. One uses the estimator 

K—l 


(1.4) 


?«)(?) = -f E &«)> 


K 


k =0 


which achieves a reduced variance (see [23] for an asymptotic analysis of slowly varying 
spectra and [25] fl5 j for non-asymptotic expressions). 

To inspect the performance of the estimator S(k)(0 on the spectral domain, let us consider 
the discrete prolate spheroidal functions, also known as Slepians. They are the discrete 
Fourier transforms of the sequences v[ k \N,W), denoted by U k {N,W', £), and satisfy the 
integral equation 

j-W 

/ D*(£ - tf)U k (N , W; £)dC= A k (N, W)U k (N, W] f), 


(1.5) 

where 

( 1 . 6 ) 


>-w 


Dat(o;) = 


sin Ntix 


Sill 7TX 


is the Dirichlet kernel. Observe that, according to (1.3), E{Sfc(£)} is a smoothing average 
of the unobservable spectrum by the kernel \U k (N, W\ £)| 2 . Recall that the bias of each 
individual estimate in (1.4) is given by 

(1.7) Bias (§,«)) = E{S»K)} -S(«) = S{() * \U k (N,WX)\ 2 ~ S{(). 

The optimal concentration of the first prolate function on the interval [-W, W ] leads to a low 
bias when k — 0. But since the amount of energy of U k (N, W ; £) inside [ —W ., W] decreases 
with k (because the energy is given by the eigenvalues in (1.5) and they decrease from 1 to 
0 as k approaches K), the bias increases with k. To explain the remarkable performance of 
the averaged estimator, Thomson noted the following: the expected value of the estimator 
(1.4) is given by 


K—l 


( 1 . 8 ) 

where 


E{%)«)} = 4 E E {S*«)} = S(() * k PK (N,W-,Z), 


k =0 


K—l 


fp,<(N, W;0 = i E T'-'W W < 0! ! 


k =0 


(1.9) 
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(a) Slepians Uk and their squares |t4| 2 , for k = (b) Thomson’s spectral window. 

1,5,9,19. 


Figure 1. Some Slepians and the spectral window with N = 256 and w = 0.1. 


is the spectral window of (1.8). The bias performance is due to the fact, numerically illus¬ 


trated by Thomson, that the spectral window (1.9) is very similar to a flat function localized 
on [—IF, IF] (see Figure [I]). This is an intriguing mathematical phenomenon. Heuristically, 
it requires the functions in the sequence {\Uk(N, W, -)| 2 : k = 0,..., K — 1} to be orga¬ 
nized inside the interval [—IF, W] in a very particular way: each function tends to fill in the 
empty energy spots left by the sum of the previous ones. This behavior is reminiscent of the 
Pythagorean relation for pure frequencies: sin 2 (t) + cos 2 (t) = 1. More precisely, claiming 
that the spectral window in Thomson’s method approximates an ideal band-pass kernel, 
means that the two functions 


( 1 . 10 ) 


fp K (N, W ,.) and -/lr- 


IV, IV], 


approach each other as K increases. This is indeed true and we provide an analytic bound 


for the L 1 -distance between the functions in (1.10) 


Theorem 1 (Spectral leakage estimate). Let N > 2 be an integer, W G (—1/2,1/2) and set 
K := [2NW\. Then 

log N 


(i.ii) 




[~W,W] 


< 


LHI) 


K 


The spectral leakage estimate (1.11) is precisely what we need in order to quantify Tliom- 

pag. 1062] and validate 


son’s asymptotic analysis of the bias of the multitaper estimator 
the bias-variance trade-off. This is explained in the Conclusion section. 

A relevant feature of the method introduced in this paper is its flexibility. While the 


description of each individual solution to the concentration problem in (1.5) is very subtle, 


the aggregated behavior of the critical number of solutions to (1.5) displays a simple profile. 
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A similar aggregated behavior has been investigated in [2] and numerically illustrated in 

M. 


Our analysis depends on the properties of the eigenvalues in (1.5). Similar properties have 
been recognized in the eigenvalue problem in the context of Hankel bandlimited functions 
[I]. Since the problem studied in |lj includes the one considered by Slepian in his construc¬ 
tion of 2d radial prolate functions, we expect our methodology to be applicable to spectral 
estimation problems involving 2d functions whose spectrum lies on a disk. This may have ap¬ 
plications in cryo-electron microscopy, where estimation of noise stochastics is an important 
consideration when applying PCA to microscopy images J26|. Other multitaper estimators 
include multi-window estimators for non-stationary spectrum mm and the one based on 
spherical Slepians H0. 


2. Proof of the main result 

Our proof uses tools from the Landau-Pollack-Slepian theory [201, 22, 121 JS! HHJ- We 
do not rely on special properties of the interval /, but rather on so-called trace / norm 
estimates that can be obtained in many other contexts of practical interest (e.g. m- Hence 
the flexibility of our approach. 

Let / := [—1/2,1/2] and let us denote the exponentials by e u} (x ) := e 2mxu . We will always 
let N > 2 be an integer and W G (—1/2,1/2). For two non-negative functions f,g, the 
notation / < g means that there exists a constant C > 0 such that / < Cg. (The constant 
C , of course, does not depend on the parameters N, W .) 


2.1. Trigonometric polynomials. For notational convenience, we use a temporal normal¬ 
ization that is slightly different of the one in the Introduction (this has no impact in the 
announced estimates). We consider the space of trigonometric polynomials 

Vn = Span je -iv+i fJ - : 0 < j < N — 1 j C L 2 (/). 

This is a Hilbert space with a reproducing kernel given by the translated Dirichlet kernel, 
Dtv(o; — y), x,y G I, N G FT, where L> N is given by (1.6). Note that j f |Dat| 2 = N. 

2.2. Toeplitz operators. For W G (—1/2,1/2) the Toeplitz operator is 

(2-1) H»f := P Vn (( Pp N f) ■ l[_ W) w]) , / e L 2 (J), 

where Pp N is the orthogonal projection onto Vn- When / G Vn , Hwf is simply the projec¬ 
tion of / • 1 [-w,w] into Vn- The Slepian functions {Uk(N, W ) : k — 0,..., N — 1} are the 
eigenfunctions of H^, with corresponding eigenvalues A fc = A k (N, W): 


*W 


( 2 . 2 ) 


\U k (N,W-0\ 2 d£ = \ k , 


<-w 
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ordered non-increasingly. We normalize the Slepian functions by: J j \Uk(N, IT; £)| d £ = 1. 
We will need a description of the profile of the eigenvalues of H^. 

Lemma 1. For N > 2, W e (—1/2,1/2) and K := [2VITJ : 


(2.3) 


K—l 


1 -'* E A ‘( JV ’ W 0 


k =0 


< 


logiV 
K ' 


We postpone the proof of Lemma [T| to the Appendix. The quantity on the left-hand 


side of (2.3) has been studied in [15] to qualitatively analyze the performance of Thomson’s 
method. Lemma [I] refines the analysis of Pi giving a concrete growth estimate. (See also 
the remarks after Theorem 5 in [T5j .) 


2.3. Proof of Theorem[lJ We first estimate the narrow band error. Note that Pk(N, IT; £) = 
Ek=o I U k (N, W; Ol 2 < Eto 1 l^k(N, W; £)| 2 = D N ( 0) = N. Consequently, ±p K {N, IT; £) < 


^ and, using (2.2), we can estimate: 


,-w 

I—W 


—pk(N,W ;0 - — l[_iy,w](0 




< 


»w 


>-W 


2 W 


N\ 

-jr I l[-m,IU](0 


= 2 W\^-- 1 


rW 

d£ + 

J-W 
K-l W 


~Pk{N,W;£) - — l[_m,iu](0 




2 

< — 
“ K 


K 2 IT 

K-l 


2NW 1 \ - / o 


K 


k=0 


l-W 


1 - 7? E A ‘ .< — 


k=0 


K 


thanks to Lemma [l] Now we estimate the broad brand leakage: 


I I\[-W,W] 


—p K (N, IT; £) - —l[_m,iu](0 


d£ = 


h\[-w,w) 


—p K (N ,W;£)d£ 


1 K-l 1 K-l 

5Z(1 “ Afc ) = 1 “ K Afc ’ 


K 


k =0 k =0 

so the conclusion follows invoking again Lemma [lj 


3. Conclusion 

In [23J Section IV], Thomson estimated Bias(S^K)) by using the approximation jrpK^N, IT, ■ 
Apl [-w,w]- Besides supporting that reasoning, Theorem |l] allows one to quantify the bias. 
Indeed, 


Bias(S(K){£)) 


< 


S * ~k Pk<kNi W ’ _ S * 2W 1 


-W,W] 


+ 


S - S * 


2 IT 


[~W,W] 


OO 
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and, if S' is a bounded function, then Theorem [T| implies that 


S*-p K (N,W,-)-S* 

A 


2 W 


:lr_ 


W,W] 


logiV 




The remaining term | [S' — S * Apl [-w,w] | 1^ can bounded by assuming that S is smooth. 
For example, if, as in Thomson’s work, S is assumed to be analytic (and periodic), then 
11 S' — S' * ^1 [-w,w] 11 ~ IE 2 , leading to the bias estimate: 


(3.1) 


Bias(S w (0) < W 2 + 


logiV 

K 


On the other hand, for a slowly varying spectrum S, Thomson [23] argues that 


(3.2) 


Var(S (K) «)) <4 


(see, [25], [15] or CD Section 3.1.2] for precise expressions for the variance.) Given a num¬ 


ber of available observations, the estimates in (3.1) and (3.2) show how much bias can be 


expected, in order to bring the variance down by a factor of l/K. This leads to a concrete 
estimate for the mean squared error 


(3.3) MSE(S w ) = E(S - S w ) 2 = Bias(% } ) 2 + Var (S {K) ) < W 4 + 


log 2 N 

~lc r ~ 


+ K ’ 


that can be used to decide on the value of the bandwidth resolution parameter W. 

We have thus obtained explicit bounds that allow us to quantify the bias-variance trade¬ 
off in Thomson’s multitaper method. Note that in the slowly varying regime, the error due 
to spectral leakage is largely dominated by the variance and therefore, in agreement with 
Thomson’s analysis, the mean squared error is ~ W 4 + jr. In the case of more rapidly 
varying spectra, (3.2) is no longer a valid approximation [25), T5] and the contribution of the 


spectral leakage to the mean squared error can be more significant. 


4. Appendix 


4.1. Integral kernels. The Toeplitz operator from (2.1) can be explicitly described by 
the formula 


H wf( x )= / f(y) K w( x ^y) d V: 


where the kernel K^(x,y) is 


(4.1) 


Kw( x ,y) = 


[~W,W] 


Dat(x — z)Dn{%) — z)dz. 
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4.2. An approximation lemma. 

Lemma 2. Let f : I -A- C an integrable function, of bounded variation, and supported on 
1° = (-1/2,1/2). For A > 2, let 


f*\B N \ 2 (x) = / f(y) \B N (x-y)\ 2 dy, 


x e I. 


Then 

(4.2) 


/-//* |D„| 2 


<Var(f,I) 


L l {I) 


log A 

A ' 


Remark 1. In the above estimate, Var(f,I) denotes the total variation of f on I. If 
f = J[-W,W], with W G (—1/2,1/2) , then Var (/, I) = 2 and the estimate reads 


l[-w,iu] — ^1 [-iu,iu] * |Djv|- 


< 


LHI) 


log A 
A ' 


Proof. By an approximation argument, we assume without loss of generality that / is smooth 
(see for example [SJ Lemma 3.2]). We also extend / periodically to M. Note that this 
extension is still smooth because f\I is supported on 1°. 

Step 1. Since f(x + h) — f{x) = f'(th + x)hdt , we can use the periodicity of / to 
estimate 


\\f(- + h)-f || L i (J) < 


1 r 1 / 2 



1 fl/2+th 



0 J-l/2+th 


\f(th + x)\dx \h\ dt = 

P J —1/2 

[■l f 1/2 

= / / l/'Or)! da; |/i| dt = Var(f,I) \h\ . 

J 0 J- 1/2 

Since / is periodic, the previous estimate can be improved to: 


|/ / (rr)| dx |/i| dt 


(4.3) 


ll/(- + h) - /llii(j) S Var(f.I) |sin(jrft) , ft € R. 


Step 2. We use the notation / A := f * |Djv | _ - By a change of variables and periodicity, 


,1/2 


f{x)-f (x) = — / (/(x) - /(y + x)) |Djv(— 2 /)| dy. 

7-1/2 


We can now finish the proof by resorting to (4.3): 


I up/) 


< 


< 

r^j 


< 

r^j 


< 


Var(f, /) — 


Var(/,/)- 


1-1/2 

f |sin(7rj/)| |Djv(j/)| 2 dj/ 
- 1/2 

/ 1//2 |sin(7rA?/)| 


|y| 


~dy 


Var(/,/)- 


l/ar(/,J) 


1 + 

log A 
A ' 


r-lV/2 i 

, bi* 
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□ 


4.3. Proof of Lemma [Tj We first note from (4.1) that 


(4.4) trace (i^) = / K^(x, x)dx — / / \TDn(x — y)\ 2 dydx = 2NW, 

Ji J[-wyv] Ji 


since 


fj |Djv| — N. Moreover a similar calculation gives 


trace 2 = / / 

J[-W,W] Ji 

Hence we can use Lemma [2] to conclude that 


" [—W,W] 


(y) |Djv(a; - y)\ 2 dydx. 


trace 


N\ 2 


(K) - (HU) 



rW 


J-w 


N 1 (x) 

[~W,W] v ) 


i l - w ,w] * i D ivr) 


dx 


< 


Ni '-w, w] ( x ) - (![ -w, w] * i D ^r) 0*0 


dx < C log N, 


for some constant C. Using this bound, we estimate: 

N—l K -1 N—l 

C log N > A fc (l — X k ) = Afc(l — Afc) + Afc(l — Afc) 


fc =0 


fc =0 


k=K 


K—l 


N—l 


K—l 


k =0 


— Atf-l — Afc) + (1 — Afc 

k =0 k=K 

K—l 

= \k~iK — \k-i A k + (1 — \k-i)(2NW — A * 

k=0 

K—l 

= Xk^K + 2NW (1 - A K-i) -J2 Xk 

k =0 

K—l 

= 2NW - A fc + A^_i(/i - 2NW) 

k =0 

A-l 

> K - Afc - 1- 


/c=0 


Hence, K — X k < C log A/ + 1. On the other hand Xk ~ ^ — 2A^W — K < 1. 


K—l 


Therefore, since N > 2, 


K ~ ES A, 


< log N and the conclusion follows. 
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